Spectral Methods for Learning Multivariate Latent Tree Structure
Authors
Abstract
This work considers the problem of learning the structure of multivariate linear tree models, which include a variety of directed tree graphical models with continuous, discrete, and mixed latent variables such as linear-Gaussian models, hidden Markov models, Gaussian mixture models, and Markov evolutionary trees. The setting is one where we only have samples from certain observed variables in the tree, and our goal is to estimate the tree structure (i.e., the graph of how the underlying hidden variables are connected to each other and to the observed variables). We propose the Spectral Recursive Grouping algorithm, an efficient and simple bottom-up procedure for recovering the tree structure from independent samples of the observed variables. Our finite sample size bounds for exact recovery of the tree structure reveal certain natural dependencies on statistical and structural properties of the underlying joint distribution. Furthermore, our sample complexity guarantees have no explicit dependence on the dimensionality of the observed variables, making the algorithm applicable to many high-dimensional settings. At the heart of our algorithm is a spectral quartet test for determining the relative topology of a quartet of variables from second-order statistics.
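The quartet test can be illustrated with a minimal sketch. It does not reproduce the paper's exact test, which also uses confidence thresholds. It assumes a simplified, four-point-condition style rule. For four observed vector-valued variables, compute empirical cross-covariance matrices, then choose the pairing whose two within-pair matrices have the largest product of top-k singular values, where k is the assumed hidden-state dimension. The function names `top_k_singular_product` and `quartet_pairing` and the synthetic data are hypothetical, and only NumPy is assumed.

```python
import numpy as np


def top_k_singular_product(x, y, k):
    """Product of the k largest singular values of the empirical cross-covariance of x and y.

    x, y: arrays of shape (n_samples, dim_x) and (n_samples, dim_y).
    """
    xc = x - x.mean(axis=0)
    yc = y - y.mean(axis=0)
    cross_cov = xc.T @ yc / x.shape[0]
    # np.linalg.svd returns singular values in descending order.
    singular_values = np.linalg.svd(cross_cov, compute_uv=False)
    return float(np.prod(singular_values[:k]))


def quartet_pairing(views, k):
    """Pick the split {{a, b}, {c, d}} of four views with the largest score.

    Score of a pairing = top-k singular-value product within pair (a, b)
    times the same quantity within pair (c, d). Hypothetical simplification:
    no confidence threshold, so some pairing is always returned.
    """
    pairings = [((0, 1), (2, 3)), ((0, 2), (1, 3)), ((0, 3), (1, 2))]

    def score(pairing):
        (a, b), (c, d) = pairing
        return (top_k_singular_product(views[a], views[b], k)
                * top_k_singular_product(views[c], views[d], k))

    return max(pairings, key=score)


# Synthetic quartet: hidden h_u is the parent of views 0 and 1,
# hidden h_v (correlated with h_u) is the parent of views 2 and 3.
rng = np.random.default_rng(0)
n, dim = 5000, 3
h_u = rng.standard_normal(n)
h_v = 0.5 * h_u + np.sqrt(0.75) * rng.standard_normal(n)  # Var(h_v) = 1, Cov(h_u, h_v) = 0.5
loading = np.ones(dim)


def observe(h):
    # Linear observation model: x = loading * h + isotropic Gaussian noise.
    return np.outer(h, loading) + rng.standard_normal((n, dim))


views = [observe(h_u), observe(h_u), observe(h_v), observe(h_v)]
print(quartet_pairing(views, k=1))
```

With one-dimensional hidden variables, the true pairing scores Var(h_u)Var(h_v) times a product of loading norms. Each wrong pairing scores Cov(h_u, h_v)^2 times the same factor. The Cauchy-Schwarz inequality therefore favors the true split, and on this synthetic data the call should print `((0, 1), (2, 3))`.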
Similar resources
Spectral Unsupervised Parsing with Additive Tree Metrics
We propose a spectral approach for unsupervised constituent parsing that comes with theoretical guarantees on latent structure recovery. Our approach is grammarless – we directly learn the bracketing structure of a given sentence without using a grammar model. The main algorithm is based on lifting the concept of additive tree metrics for structure learning of latent trees in the phylogenetic a...
Spectral Learning of Large Structured HMMs for Comparative Epigenomics
We develop a latent variable model and an efficient spectral algorithm motivated by the recent emergence of very large data sets of chromatin marks from multiple human cell types. A natural model for chromatin data in one cell type is a Hidden Markov Model (HMM); we model the relationship between multiple cell types by connecting their hidden states by a fixed tree of known structur...
A Spectral Algorithm for Latent Tree Graphical Models
Latent variable models are powerful tools for probabilistic modeling, and have been successfully applied to various domains, such as speech analysis and bioinformatics. However, parameter learning algorithms for latent variable models have predominantly relied on local search heuristics such as expectation maximization (EM). We propose a fast, local-minimum-free spectral algorithm for learning ...
Scalable Latent Tree Model and its Application to Health Analytics
We present an integrated approach to structure and parameter estimation in latent tree graphical models, where some nodes are hidden. Our approach follows a “divide-and-conquer” strategy, and learns models over small groups of variables (where the grouping is obtained through preprocessing). A global solution is obtained in the end through simple merge steps. Our structure learning procedure in...
Spectral Dependency Parsing with Latent Variables
Recently there has been substantial interest in using spectral methods to learn generative sequence models like HMMs. Spectral methods are attractive as they provide globally consistent estimates of the model parameters and are very fast and scalable, unlike EM methods, which can get stuck in local minima. In this paper, we present a novel extension of this class of spectral methods to learn de...